Hugging Face Quick Start


2023-02-26 14:49 | Source: compiled from the web

Getting Started with Hugging Face

Reference video: Getting Started With Hugging Face in 15 Minutes https://www.google.com/search?q=Huggingface+tutorial&oq=Huggingface+tutorial&aqs=chrome…69i57j0l5.9301j0j8&sourceid=chrome&ie=UTF-8#fpstate=ive&vld=cid:61ceeebe,vid:QEaBAZQCtwE

The Hugging Face Transformers library is one of the most popular machine learning libraries today, and it helps beginners get started quickly.

Installing Transformers

Install PyTorch, TensorFlow 2.0, or Flax first, then install the library:

    pip install transformers
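Before installing, it can help to check which backend is already present. The following is a minimal sketch using only the standard library; the helper name installed_backends is hypothetical, and the module names (torch, tensorflow, flax) are the usual import names for the three backends mentioned above.

```python
import importlib.util

# Import names of the three backends named above (PyPI package names may differ).
BACKENDS = {"PyTorch": "torch", "TensorFlow": "tensorflow", "Flax": "flax"}


def installed_backends():
    """Map each backend's display name to True if its module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKENDS.items()
    }


if __name__ == "__main__":
    for name, present in installed_backends().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

If all three are reported missing, install one of them before running pip install transformers.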

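1. Pipeline: ready-made task workflows that you can call directly

Import the function with from transformers import pipeline and call it as in the examples below. The first example uses the default model of the sentiment-analysis pipeline.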

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")  # default model
    res = classifier("I'd love to learn the HuggingFace course")
    print(res)
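The sentiment-analysis pipeline returns a list with one dictionary per input, each holding a label and a score. As a rough mental model of the last step, the sketch below converts raw model scores (logits) into that shape with a softmax. It is a plain-Python illustration, not the library's code: the logits are made up, and to_prediction is a hypothetical helper.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits, id2label):
    """Pick the most probable label and report it with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical logits for one sentence; real values come from the model.
example_logits = [-3.2, 3.5]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print([to_prediction(example_logits, id2label)])
```

A score close to 1 means the model strongly prefers that label over the other one.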

The second example specifies the model explicitly.

    from transformers import pipeline

    generator = pipeline("text-generation", model="distilgpt2")
    res = generator(
        "In this workshop, we will help you finish",
        max_length=30,
        num_return_sequences=2,
    )
    print(res)
    # Example output:
    # [{'generated_text': 'In this workshop, we will help you finish your book with just a few clicks of a click of a click of an important key. Also, we'},
    #  {'generated_text': 'In this workshop, we will help you finish the program. The workshop includes a list of exercises, a guide to learning, and a tutorial about techniques'}]

The available pipelines are listed in the official documentation: https://huggingface.co/docs/transformers/pipeline_tutorial

2. Model/Tokenizer

    from transformers import pipeline
    from transformers import AutoTokenizer, AutoModelForSequenceClassification, BertTokenizer, BertModel

    # Load the model and tokenizer explicitly, then hand both to the pipeline.
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
    res = classifier("I love this movie")
    print(res)

A closer look at the tokenizer
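As a mental model for the tokenizer calls, the toy sketch below walks the same three steps in plain Python: split text into tokens, map tokens to ids, and map ids back to text. Everything in it is an assumption made for illustration: the six-entry vocabulary and its ids are invented, the function names only mirror the tokenizer methods, and the greedy longest-match split is a simplified WordPiece-style scheme rather than the library's implementation.

```python
# Toy vocabulary: hypothetical tokens and ids, not the real model vocabulary.
VOCAB = {"[UNK]": 0, "hello": 1, ",": 2, "world": 3, "token": 4, "##izer": 5}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}


def split_words(text):
    """Lowercase, split on whitespace, and make each punctuation mark its own word."""
    words, current = [], ""
    for ch in text.lower():
        if ch.isalnum():
            current += ch
        else:
            if current:
                words.append(current)
                current = ""
            if not ch.isspace():
                words.append(ch)
    if current:
        words.append(current)
    return words


def tokenize(text):
    """Greedy longest-match subword split; continuation pieces start with '##'."""
    tokens = []
    for word in split_words(text):
        pieces, start = [], 0
        while start < len(word):
            end, piece = len(word), None
            while end > start:
                candidate = word[start:end]
                if start > 0:
                    candidate = "##" + candidate
                if candidate in VOCAB:
                    piece = candidate
                    break
                end -= 1
            if piece is None:  # no known piece fits: the whole word is unknown
                pieces = ["[UNK]"]
                break
            pieces.append(piece)
            start = end
        tokens.extend(pieces)
    return tokens


def convert_tokens_to_ids(tokens):
    """Look each token up in the vocabulary."""
    return [VOCAB[t] for t in tokens]


def decode(ids):
    """Join tokens with spaces, gluing '##' pieces onto the previous token."""
    text = ""
    for token in (ID_TO_TOKEN[i] for i in ids):
        if token.startswith("##"):
            text += token[2:]
        elif text:
            text += " " + token
        else:
            text = token
    return text


tokens = tokenize("hello,world")
ids = convert_tokens_to_ids(tokens)
print(tokens, ids, decode(ids))
```

The real tokenizer works with a much larger vocabulary, and calling it directly also adds special tokens around the sequence, which this toy version leaves out.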

    from transformers import pipeline
    from transformers import AutoTokenizer, AutoModelForSequenceClassification, BertTokenizer, BertModel

    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

    sequence = "hello,world"
    res = tokenizer(sequence)  # full encoding: input_ids and attention_mask
    print(res)
    tokens = tokenizer.tokenize(sequence)  # list of token strings
    print(tokens)
    ids = tokenizer.convert_tokens_to_ids(tokens)  # ids for those tokens only
    print(ids)
    decoded_string = tokenizer.decode(ids)  # ids back to text
    print(decoded_string)

3. PyTorch/TF

    res = tokenizer(sequence, padding=True, truncation=True, max_length=512, return_tensors="pt")  # PyTorch tensors

Use this form when you need to train or fine-tune a model.
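To make the padding, truncation, and max_length arguments concrete, here is a minimal plain-Python sketch of what they do to a batch of id lists. It is an illustration under stated assumptions, not the library's code: pad_and_truncate is a hypothetical helper, the ids are made up, the pad id is assumed to be 0, and the naive slice ignores the special tokens that the real tokenizer preserves when truncating.

```python
def pad_and_truncate(batch, max_length, pad_id=0):
    """Cut each id list to max_length, then pad all lists to the longest one left."""
    truncated = [ids[:max_length] for ids in batch]
    width = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        padding = width - len(ids)
        input_ids.append(ids + [pad_id] * padding)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up sequences of different lengths.
batch = [[11, 12, 13], [21, 22, 23, 24, 25]]
encoded = pad_and_truncate(batch, max_length=4)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

After this step every row has the same length, which is what allows the batch to be packed into a single tensor.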

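4. Save/Load

    save_directory = "saved"

    # Write the tokenizer and model files to the directory.
    tokenizer.save_pretrained(save_directory)
    model.save_pretrained(save_directory)

    # Load both back from the same directory.
    tok = AutoTokenizer.from_pretrained(save_directory)
    mod = AutoModelForSequenceClassification.from_pretrained(save_directory)

5. Model Hub

Browse the available tasks and models on the official site: https://huggingface.co/models
On a model page, the copy button next to the model name copies the name in one click.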

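6. Finetune

The workflow: prepare the data, load the pretrained tokenizer and encode the data, build a PyTorch dataset, load the pretrained model, then create and initialize a Trainer and train.

    from transformers import Trainer, TrainingArguments

    # model, tokenizer, tokenized_datasets, and data_collator come from the earlier steps.
    training_args = TrainingArguments("test-trainer")
    trainer = Trainer(
        model,
        training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    trainer.train()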
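The dataset-building step of this workflow can be sketched as a small map-style dataset, the kind of object PyTorch data loaders accept because it exposes __len__ and __getitem__. This is a minimal sketch under assumptions: EncodedDataset is a hypothetical class, the encodings and labels are made-up values, and the "labels" key is assumed to be what the model expects. In a real project you would typically subclass torch.utils.data.Dataset or use the datasets library instead.

```python
class EncodedDataset:
    """Map-style dataset: item i is a dict of encoded features plus its label."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # e.g. {"input_ids": [...], "attention_mask": [...]}
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        item = {key: values[index] for key, values in self.encodings.items()}
        item["labels"] = self.labels[index]
        return item


# Made-up encodings for two short examples; a tokenizer would produce these.
encodings = {
    "input_ids": [[11, 12, 0], [21, 22, 23]],
    "attention_mask": [[1, 1, 0], [1, 1, 1]],
}
train_dataset = EncodedDataset(encodings, labels=[1, 0])
print(len(train_dataset), train_dataset[0])
```

An object like train_dataset is what would be passed as the training dataset, with a data collator turning the per-item lists into batched tensors.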

